Three experiments using human participants varied the distribution of point-gain reinforcers or point-loss
punishers in two-alternative signal-detection procedures. Experiment 1 varied the distribution of
point-gain reinforcers for correct responses (Group A) and point-loss punishers for errors (Group B) 
across conditions. Response bias varied systematically as a function of the relative reinforcer or punisher 
frequencies. Experiment 2 arranged two conditions - one where an unequal ratio of reinforcement (5:1 
or 1:5) was presented without punishment (R-only), and another where the same reinforcer ratio was 
presented with an equal distribution of point-loss punishers (R+P). Response bias was significantly 
greater in the R-only condition than the R+P condition, supporting a subtractive model of punishment. 
Experiment 3 varied the distribution of point-gain reinforcers for correct responses across four unequal 
reinforcer ratios (5:1, 2:1, 1:2, 1:5) both without (R-only) and with (R+P) an equal distribution of point- 
loss punishers for errors. Response bias varied systematically with changes in relative reinforcer 
frequency for both R-only and R+P conditions, with 5 out of 8 participants showing increases in 
sensitivity estimates from R-only to R+P conditions. Overall, the results indicated that punishers have 
similar but opposite effects to reinforcers in detection procedures and that combined reinforcer and 
punisher effects might be better modeled by a subtractive punishment model than an additive 
punishment model, consistent with research using concurrent-schedule choice procedures.

Many situations require organisms to discriminate
between stimuli that signal different
consequences. For example, a bee must decide 
whether a plant’s pollen is toxic or safe, or a 
pedestrian must decide whether or not it is 
safe to cross the road. In these examples, both 
the positive consequences arising from correct 
choices and the negative consequences arising 
from errors affect the choices that are made. 

Signal-detection tasks (also known as conditional
discriminations) are often used to study
choice and stimulus discriminability. These are
discrete-trial procedures in which, on each trial,
the subject is presented with one of two
discriminative stimuli (S₁ or S₂) that vary on
some dimension (e.g., intensity or color). The
subject then chooses between two response
alternatives (B₁ or B₂), where B₁ is the correct
response following an S₁ presentation, and B₂
is the correct response following an S₂
presentation. B₁ and B₂ are usually physical
responses, such as left or right key pecks or
lever presses. With two stimulus types (S₁ and
S₂) and two response options (B₁ and B₂),
there are four possible response outcomes
(Figure 1): B₁₁ (responding B₁ following S₁),
a correct response; B₁₂ (responding B₂ following
S₁), an error; B₂₁ (responding B₁ following
S₂), an error; and B₂₂ (responding B₂
following S₂), a correct response. Often,
correct responses (B₁₁ and B₂₂) are reinforced
(e.g., money: Johnstone & Alsop, 2000; food:
McCarthy & Davison, 1979; brain stimulation:
Terman, 1970) while errors (B₁₂ and B₂₁) have
no consequence.

Behavioral models of signal-detection per- 
formance (e.g., Alsop, 1991; Davison, 1991; 
Davison & Nevin, 1999; Davison & Tustin, 
1978) arose from the generalized matching 
law (GML: Baum, 1974) which describes how 
behavior is allocated across two concurrently 
available response alternatives when each 
alternative is associated with its own schedule 
of reinforcement. The GML can be written 

$$\log\left(\frac{B_1}{B_2}\right) = a \log\left(\frac{R_1}{R_2}\right) + \log c, \qquad (1)$$

where B₁ and B₂ are the numbers of responses
made on Alternatives 1 and 2 respectively, and
R₁ and R₂ are the numbers of obtained
reinforcers for B₁ and B₂ responses respectively.
Equation 1 is in the form of a straight line
with slope a and intercept log c. The
parameter a is the sensitivity of the subject’s
behavior to the distribution of reinforcers,
and measures the extent to which changes in
the reinforcer distribution (R₁/R₂) produce
changes in the response distribution (B₁/B₂).
The parameter log c measures any inherent
bias in the subject’s behavior towards making
B₁ or B₂ responses, irrespective of the reinforcer
distribution. Inherent bias is often
attributed to undetected asymmetries in the
apparatus (e.g., one key requires less force to
peck than the other) or the subject (e.g., color
or side preferences) (Baum, 1974).

Fig. 1. A 2 × 2 matrix illustrating the four possible
response outcomes in a two-alternative signal-detection
task.

The most widely-used behavioral descriptor
of signal-detection performance is Davison and
Tustin’s (1978) GML-based model. They proposed
that when two stimuli (S₁ and S₂) are
indistinguishable, the distribution of responses
across the two response alternatives (B₁/B₂)
should depend on the relative distribution of
reinforcers for the two alternatives (R₁/R₂) in
the manner of the GML (Equation 1). However,
once the stimuli become more distinguishable,
behavior also becomes biased towards
making correct (B₁₁ and B₂₂) responses. Choice
in detection tasks is described on S₁ trials by

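$$\log\left(\frac{B_{11}}{B_{12}}\right) = a \log\left(\frac{R_{11}}{R_{22}}\right) + \log c + \log d \qquad (2)$$

and on S₂ trials by

$$\log\left(\frac{B_{21}}{B_{22}}\right) = a \log\left(\frac{R_{11}}{R_{22}}\right) + \log c - \log d, \qquad (3)$$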


where B₁₁, B₁₂, B₂₁, B₂₂, a, and log c are as
above, and R₁₁ and R₂₂ are the numbers of
reinforcers obtained for correct B₁₁ and B₂₂
responses respectively. The parameter log d
measures discriminability between the two
stimuli, S₁ and S₂. When log d = 0, the stimuli
are not discriminated and Equations 2 and 3
reduce to the GML. As discriminability (log d)
increases, subjects make more B₁ responses
following S₁ (B₁₁) and more B₂ responses
following S₂ (B₂₂); hence, log d is additive in
Equation 2 and subtractive in Equation 3.

Algebraic subtraction and addition of Equa- 
tions 2 and 3 allows separate calculation of 
point estimates of discriminability and bias. 
Algebraic subtraction provides a bias-free 
measure of discriminability: 

$$\log d = 0.5 \log\left(\frac{B_{11}B_{22}}{B_{12}B_{21}}\right), \qquad (4)$$


where all notation is as above. Algebraic 
addition of Equations 2 and 3 provides a 
discriminability-free measure of response bias: 


$$\log b = 0.5 \log\left(\frac{B_{11}B_{21}}{B_{12}B_{22}}\right) = a \log\left(\frac{R_{11}}{R_{22}}\right) + \log c, \qquad (5)$$


where all notation is as above. Equation 5 
states that response bias (log b) incorporates 
both reinforcer effects and inherent bias (log c),
as described by the GML.
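As a minimal sketch of how the point estimates in Equations 4 and 5 can be computed from the four response counts, the following Python functions may help. The function names are hypothetical, base-10 logarithms are assumed, and the example counts are those listed for participant DN in the 5:1 condition of Table 1. Any zero count would make the estimates undefined, and the sketch does not handle that case.

```python
import math


def log_d(b11, b12, b21, b22):
    """Discriminability (Equation 4): correct products over error products."""
    return 0.5 * math.log10((b11 * b22) / (b12 * b21))


def log_b(b11, b12, b21, b22):
    """Response bias (Equation 5): positive values indicate a bias toward B1."""
    return 0.5 * math.log10((b11 * b21) / (b12 * b22))


# Counts for participant DN, 5:1 condition (Table 1): B11, B12, B21, B22.
counts = (54, 7, 22, 37)
print(round(log_d(*counts), 2), round(log_b(*counts), 2))  # prints 0.56 0.33
```

The printed values agree with the log d and log b entries reported for that row of Table 1, which suggests the base-10 assumption is consistent with the reported estimates.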

Davison and Tustin’s (1978) behavioral model
of signal detection has described choice
behavior well when relative reinforcer frequencies
or magnitudes are varied (e.g., Boldero,
Davison, & McCarthy, 1985; McCarthy & Davison,
1979). The model also predicts independence
between its parameters; for example,
changes in the distribution of reinforcers (R₁₁/R₂₂)
should not produce systematic changes in
discriminability (log d), and changes in discriminability
should not affect sensitivity to the
reinforcer distribution (a). However, there is
conflicting evidence regarding whether these
assumptions of independence are met (see
Alsop & Porritt, 2006; Johnstone & Alsop,
1999). Despite these limitations, the model is
still widely used for detection and matching-to-sample
data analyses.

Davison and Tustin’s (1978) model and
subsequent research (see Davison & McCarthy,
1988, for a summary) have focused almost
exclusively on the effects of varied reinforcer
contingencies. In contrast, the effects of
punishers for errors have received relatively
little attention (but see Galanter & Holman,
1967; Hume & Irwin, 1974; Wright & Nevin,
1974). Hume and Irwin investigated the effects
of varied punisher (time-out) contingencies
using a detection procedure with rats but
found little effect of varied relative time-out 
durations on response bias. Galanter and 
Holman varied both relative monetary gains 
and losses and found participants were biased 
towards responding on the alternative associ- 
ated with the greater monetary gain and the 
smaller monetary loss. Finally, Wright and 
Nevin varied the intensity of shock punish- 
ment on one alternative then increased the 
frequency of reinforcement for that alternative 
and found changes in the location (but not 
the slope) of the bias function. The lack of 
punishment research with detection proce- 
dures is of concern because, like reinforcers, 
punishers are common in many real-world 
detection tasks (e.g., toxic pollen might kill 
the bee). Ideally, any model of detection 
should describe both the effects of reinforce- 
ment for correct responses and the effects of 
punishment for errors. 

To incorporate punishment into a detection 
model, it seems obvious to examine how the 
effects of punishers and reinforcers are mod- 
eled in standard concurrent schedules. There 
are two main competing models — an additive 
model (e.g., Deluty, 1976) and a subtractive 
model (e.g., de Villiers, 1980; Farley & 
Fantino, 1978). The additive model proposes 
that the effects of punishment on one 
response alternative add to the effects of 
reinforcement on the other alternative, while 
the subtractive model proposes that the effects 
of punishment directly subtract from reinforc- 
er effects on the same alternative. Few studies 
have investigated the predictions of these 
models, but there appears to be more empir- 
ical support for the subtractive model (Critch- 
field, Paletz, MacAleese, & Newland, 2003; de 
Villiers, 1980; Farley, 1980; Farley & Fantino, 
1978) than the additive model (Deluty, 1976). 


Both models are readily incorporated into 
Davison and Tustin’s (1978) GML-based mod- 
el of signal detection (Equations 2 and 3). 
When correct responses are intermittently 
reinforced and errors are intermittently pun- 
ished, the additive punishment version (e.g., 
Deluty, 1976) of Davison and Tustin’s model 
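is, following S₁ presentations,

$$\log\left(\frac{B_{11}}{B_{12}}\right) = a \log\left(\frac{R_{11} + qP_{12}}{R_{22} + qP_{21}}\right) + \log c + \log d, \qquad (6)$$

and following S₂ presentations,

$$\log\left(\frac{B_{21}}{B_{22}}\right) = a \log\left(\frac{R_{11} + qP_{12}}{R_{22} + qP_{21}}\right) + \log c - \log d, \qquad (7)$$

with response bias calculated as

$$\log b = 0.5 \log\left(\frac{B_{11}B_{21}}{B_{12}B_{22}}\right) = a \log\left(\frac{R_{11} + qP_{12}}{R_{22} + qP_{21}}\right) + \log c. \qquad (8)$$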


Notation is as above, but now P₁₂ and P₂₁ are
the numbers of obtained punishers for incorrect
B₁₂ and B₂₁ responses respectively, and q is
a scaling parameter used to equate the value of
one punisher relative to one reinforcer (e.g., if
q = .5, then a punisher would be half the
perceived value of a reinforcer). In Equations
6 to 8, the effects of punishers obtained on
one response alternative (e.g., P₁₂ for incorrect
B₂ responses) add to the effects of
reinforcers obtained for the other response
alternative (e.g., R₁₁ for correct B₁ responses).

Likewise, a subtractive punishment version 
(e.g., de Villiers, 1980; Farley, 1980) of Davison 
and Tustin’s (1978) model can be written, 
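following S₁ presentations, as

$$\log\left(\frac{B_{11}}{B_{12}}\right) = a \log\left(\frac{R_{11} - qP_{21}}{R_{22} - qP_{12}}\right) + \log c + \log d, \qquad (9)$$

and following S₂ presentations, as

$$\log\left(\frac{B_{21}}{B_{22}}\right) = a \log\left(\frac{R_{11} - qP_{21}}{R_{22} - qP_{12}}\right) + \log c - \log d, \qquad (10)$$

with response bias calculated as

$$\log b = 0.5 \log\left(\frac{B_{11}B_{21}}{B_{12}B_{22}}\right) = a \log\left(\frac{R_{11} - qP_{21}}{R_{22} - qP_{12}}\right) + \log c, \qquad (11)$$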


where all notation is as above. In Equations 9,
10, and 11, the effects of punishers obtained
on one response alternative (e.g., P₂₁ for
incorrect B₁ responses) subtract from the
effects of reinforcers obtained on the same
response alternative (e.g., R₁₁ for correct B₁
responses). Note that Equations 9, 10, and 11
are undefined if qP₂₁ is greater than R₁₁ or qP₁₂
is greater than R₂₂.
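The step from the two trial-type equations of the subtractive model to its bias expression is not spelled out above. A short sketch of that algebra, using only the symbols already defined, is given below; it mirrors the addition of Equations 2 and 3 that yields Equation 5.

```latex
\begin{align*}
\log\left(\frac{B_{11}}{B_{12}}\right) + \log\left(\frac{B_{21}}{B_{22}}\right)
  &= 2a \log\left(\frac{R_{11} - qP_{21}}{R_{22} - qP_{12}}\right)
     + 2\log c + (\log d - \log d) \\
\log b = 0.5 \log\left(\frac{B_{11}B_{21}}{B_{12}B_{22}}\right)
  &= a \log\left(\frac{R_{11} - qP_{21}}{R_{22} - qP_{12}}\right) + \log c
\end{align*}
```

Because the logarithm takes the ratio of the two net terms as its argument, both the numerator and the denominator must be positive for the bias prediction to exist.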

Figure 2 illustrates bias predictions made by 
the additive (Equation 8) and subtractive 
(Equation 11) punishment versions of Davison 
and Tustin’s (1978) model under two different 
reinforcer and punisher arrangements. In the 
first arrangement (Figure 2, top), relative 
punisher frequency was varied from 1:11 to 
11:1 (variable interval [VI] 60 s:VI 5.5 s to VI 
5.5 s:VI 60 s) with a constant and equal (VI 
3 s:VI 3 s) background rate of reinforcement.
Figure 2 (top) shows that the additive (dotted 
line) and subtractive (dashed line) models 
predict systematic biases away from the more 
punished alternative (i.e., negatively sloping 
functions) with the subtractive model predict- 
ing slightly more extreme response biases than 
the additive model. 

Figure 2 (bottom) shows the predictions of 
both models when relative reinforcer frequency 
was varied (7:1 to 1:7) with a constant and equal 
(1:1) background rate of punishment. A 
reinforcer-only baseline, where the relative 
reinforcer frequency was varied (7:1 to 1:7) 
without any punishment for errors, is also 
shown for comparison (solid line). When 
subjects received a constant and equal rate of 
punishers for errors, the additive and subtrac- 
tive models make different predictions. The 
additive model (dotted line) predicts a shal- 
lower function than the reinforcer-only condi- 
tions; that is, it predicts a reduced preference 
for the more reinforced alternative. The sub- 
tractive model (dashed line) predicts a steeper 
(and nonlinear) function than the reinforcer- 
only conditions; that is, it predicts an increased 
preference for the more reinforced alternative. 
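The contrast between the two models can be illustrated numerically. The sketch below implements the bias expressions of Equations 5, 8, and 11 with the parameter values used for Figure 2 (a = .9, log c = 0, q = 1). The function names and the reinforcer and punisher counts are hypothetical, chosen only to give a 7:1 reinforcer ratio with equal punisher counts; they are not the values used to draw the figure.

```python
import math


def bias_reinforcer_only(r11, r22, a=0.9, log_c=0.0):
    """Equation 5: bias predicted from the reinforcer ratio alone."""
    return a * math.log10(r11 / r22) + log_c


def bias_additive(r11, r22, p12, p21, a=0.9, log_c=0.0, q=1.0):
    """Equation 8: punishers add to reinforcers on the other alternative."""
    return a * math.log10((r11 + q * p12) / (r22 + q * p21)) + log_c


def bias_subtractive(r11, r22, p12, p21, a=0.9, log_c=0.0, q=1.0):
    """Equation 11: punishers subtract from reinforcers on the same alternative."""
    numerator = r11 - q * p21
    denominator = r22 - q * p12
    if numerator <= 0 or denominator <= 0:
        # The log of a non-positive ratio does not exist.
        raise ValueError("subtractive model is undefined for these counts")
    return a * math.log10(numerator / denominator) + log_c


# Hypothetical counts: 7:1 reinforcer ratio, equal punishers on both alternatives.
r11, r22, p12, p21 = 70, 10, 5, 5
print("additive:       ", round(bias_additive(r11, r22, p12, p21), 3))
print("reinforcer only:", round(bias_reinforcer_only(r11, r22), 3))
print("subtractive:    ", round(bias_subtractive(r11, r22, p12, p21), 3))
```

With these counts the additive prediction falls below the reinforcer-only prediction and the subtractive prediction falls above it, which is the ordering described for Figure 2 (bottom).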

Fig. 2. Predictions made by punishment versions of
Davison and Tustin’s (1978) GML-based model of signal
detection. The effects of varied punisher ratio (top) and
reinforcer ratio (bottom) on response bias (log b) are plotted
for additive model predictions (dotted lines) and subtractive
model predictions (dashed lines), when a = .9, log c = 0, and
q = 1. Figure 2 (bottom) also plots the predicted changes in
response bias when the relative reinforcer ratio is varied
without punishment for errors (solid line).

The present experiments examined the
effects of punishment for errors in detection
procedures using human participants. Historically,
the most commonly used punisher for
nonhuman subjects in behavioral experiments
was electric shock (Baron, 1991). Because of
ethical constraints associated with using human
participants, however, response cost was
chosen as the punisher type for the present
experiments. Response cost has been an
effective aversive stimulus in both basic (Crosbie,
1998) and applied (Lerman & Vorndran,
2002) settings, and is defined as the contingent
removal of conditioned reinforcers, such
as points (Weiner, 1962, 1963) or money
(Critchfield et al., 2003). In the present
experiments, reinforcers were point gains
and punishers were point losses. These points
were exchangeable for reduced session time
(i.e., point losses resulted in increased session
time). Experiment 1 investigated whether
these were effective reinforcers and punishers
for human participants. Experiments 2 and 3
examined which of the two competing models
(additive or subtractive) was a better descriptor
of choice in detection procedures.

EXPERIMENT 1 

Experiment 1 used a perceptual discrimination
task where participants judged whether
stimulus arrays contained more blue or yellow
objects (e.g., Johnstone & Alsop, 1996, 2000).
Two groups of participants were used: Group
A examined the efficacy of point-gain reinforcers
while Group B examined the effects of
point-loss punishers. For Group A, the ratio of
reinforcers for correct responses (R₁₁:R₂₂)
varied across four conditions (5:1, 2:1, 1:2,
and 1:5) with no punishers for errors. It was
predicted that Group A participants would be
systematically biased towards responding to the
more reinforced alternative, consistent with the
GML (Equation 5) and previous human (e.g.,
Alsop, Rowley, & Fon, 1995; Johnstone & Alsop,
1996) and nonhuman (e.g., McCarthy &
Davison, 1979) detection research. For Group
B, the ratio of punishers for errors (P₂₁:P₁₂)
varied across four conditions (5:1, 2:1, 1:2, and
1:5) against a background of a 1:1 reinforcer
ratio for correct responses. It was predicted that
participants in Group B would be systematically
biased away from responding to the more
punished alternative (Equations 8 and 11, and
Figure 2, top).

Method 

Participants 

Undergraduate students at the University of
Otago participated as part of an optional piece
of assessment. In Group A, there were 1 male
and 5 females aged 18 to 19 years (M
= 18.3 years). In Group B, there were 3 males
and 3 females aged 18 to 21 years (M
= 19.0 years).

Apparatus 

The experiment was conducted in a room
approximately 2.3 m × 3.0 m. A computer ran
the tasks and recorded the participants’
responses using a program written in Microsoft
Visual Basic™ 6.0. Stimuli and instructions
were presented on a standard 38 cm (15")
color monitor. Stimuli were 10 × 10 arrays
(129 mm wide × 138 mm high) in the center
of a white screen with each position of the 
array occupied by either a blue “greeblie” or 
yellow “greeblie” (i.e., alien cartoon charac- 
ters). Each greeblie was approximately 10 mm 
wide by 12 mm high against a white back- 
ground. Stimuli classified as “more blue” 
consisted of at least 52 array positions filled 
randomly with blue greeblies and no more 
than 48 array positions filled with yellow 
greeblies. Stimuli classified as “more yellow” 
had at least 52 yellow greeblies and no more 
than 48 blue greeblies. As described below 
(Procedure), the final proportions of blue and 
yellow greeblies depended on each partici- 
pant’s performance. 

Participants responded by clicking the com- 
puter mouse over one of two response 
“boxes” presented on the computer screen 
1.5 cm under each stimulus array and 7 cm 
apart from one another. Each response box 
was 4 cm wide by 1.5 cm high. The left and 
right boxes were colored and labeled “blue” 
and “yellow” respectively. An arrow-shaped 
cursor indicated the virtual position of the 
computer mouse on the screen. Figure 3 
shows an example of a stimulus array with 
the response boxes presented below.

Procedure 

All participants attended four experimental
sessions (one condition per session), no less
than 24 hours apart and no more than one
week apart. The order of conditions was
partially counterbalanced across participants
(Table 1). Participants read an information
sheet which briefly described the experiment
and signed an informed consent form before
the start of the first session. They were then
seated with their heads approximately half a
meter away from the computer screen.

Fig. 3. An illustrative example of a “more blue”
stimulus array with the response buttons presented during
each trial in Experiment 1.

Group A. The following set of instructions 
was presented on the computer screen at the 
start of each session. Participants advanced 
screens using the computer mouse to click the 
“next screen” button located at the bottom 
left corner of the screen. 

Screen 1: “Hi, this is a simple computer game. You
will see some patterns of blue greeblies and yellow
greeblies. You must decide if there are more blue ones
or yellow ones, and then press the blue or yellow
button. Here is an example of a pattern.”

Screen 2: “If there are more blue greeblies, press the
blue button.” An example array showing more
blue greeblies was presented.

Screen 3: “If there are more yellow greeblies, press
the yellow button.” An example array showing
more yellow greeblies was presented.

Screen 4: “Sometimes when you are correct you will
gain a point. Sometimes nothing happens, you might
be correct or wrong. When you get 70 points, the
session will end and you can go!”

Screen 5: “As you go, a red bar (like that on the
right) will show you how close you are to finishing the
experiment. When the red bar gets to the top, you can
go!” A vertical thermometer bar was presented
on the right side of Screen 5.

Screen 6: “Any questions? If not, you are ready to
start the session.” The “Begin Experiment”
button appeared.


Each trial began with a 15 mm × 15 mm
animated picture of a juggler (warning stimu- 
lus) in the middle of the screen for 1 s. A 
stimulus array (containing either more blue or 
yellow greeblies) and the two response boxes 
then appeared. The array remained on screen 
until the participant clicked on a response 
box, or for a maximum of 3 s. If the 
participant had not responded after the 3-s 
stimulus presentation, the array disappeared 
and the response boxes remained on the 
screen until the participant clicked one of 
them. The response boxes then disappeared. 

Following each response, there were two 
possible consequences. If a reinforcer had not 
been scheduled for that response, the screen 
went blank for 1 s (i.e., no consequence), 
followed by a 1-s intertrial interval (ITI). A 
“next trial” button then appeared in the 
center of the screen. A click on the button 
started the next trial. 

If the participant made a correct response
(B₁₁ or B₂₂) and a reinforcer was scheduled for
that response, the statement “Correct! You are
one point closer to finishing the session” appeared
in the center of the screen for 2 s. This was
accompanied by a 1-s “ta da!” sound and a 
thermometer bar appeared on the right side of 
the screen. The bar was divided into 70 blank 
spaces (the number of points required to exit 
the session). Each time the participant ob- 
tained a point, one space of the bar was filled 
in red, indicating that the bar had gone up. A 
1-s ITI then followed, the “next trial” button 
appeared on the screen, and the participant 
clicked the button to start the next trial. 

Table 1. The numbers of B₁₁, B₁₂, B₂₁, and B₂₂ responses, R₁₁ and R₂₂ reinforcers, P₂₁ and P₁₂ punishers,
and estimates of discriminability (log d) and response bias (log b) calculated across the last 120
trials for each participant in each condition in Group A (varied reinforcer ratios) and Group B
(varied punisher ratios) of Experiment 1. Cond. gives the arranged R₁₁:R₂₂ reinforcer ratio
(Group A) or P₂₁:P₁₂ punisher ratio (Group B).

Part.  Cond.  Order   B₁₁  B₁₂  B₂₁  B₂₂  R₁₁  R₂₂  P₂₁  P₁₂  log d  log b

GROUP A
DN     5:1    1       54   7    22   37   27   5    0    0    0.56    0.33
DN     2:1    3       52   9    18   41   24   12   0    0    0.56    0.20
DN     1:2    4       40   21   6    53   11   21   0    0    0.61   -0.33
DN     1:5    2       25   34   11   50   4    27   0    0    0.26   -0.40
EW     5:1    2       50   9    22   39   28   7    0    0    0.50    0.25
EW     2:1    4       44   16   24   36   22   8    0    0    0.31    0.13
EW     1:2    3       28   30   19   43   11   18   0    0    0.16   -0.19
EW     1:5    1       32   28   10   50   5    24   0    0    0.38   -0.32
GJS    5:1    3       46   15   36   23   23   4    0    0    0.15    0.34
GJS    2:1    1       50   10   24   36   23   10   0    0    0.44    0.26
GJS    1:2    2       34   26   22   38   9    20   0    0    0.18   -0.06
GJS    1:5    4       30   30   11   49   5    25   0    0    0.32   -0.32
KP     5:1    4       45   17   19   39   22   6    0    0    0.37    0.06
KP     2:1    2       41   20   18   41   23   11   0    0    0.33   -0.02
KP     1:2    1       35   23   13   49   11   21   0    0    0.38   -0.20
KP     1:5    3       42   18   28   32   4    19   0    0    0.21    0.15
SLJ    5:1    1       38   22   34   26   20   5    0    0    0.06    0.18
SLJ    2:1    3       48   12   17   43   24   13   0    0    0.50    0.10
SLJ    1:2    2       24   39   9    48   8    17   0    0    0.26   -0.47
SLJ    1:5    4       40   21   12   47   5    28   0    0    0.44   -0.16
SC     5:1    3       44   15   24   37   24   6    0    0    0.33    0.14
SC     2:1    2       48   13   15   44   18   11   0    0    0.52    0.05
SC     1:2    4       31   30   18   41   8    17   0    0    0.19   -0.17
SC     1:5    1       30   30   10   50   6    25   0    0    0.35   -0.35

GROUP B
DLG    5:1    4       42   19   13   46   16   11   9    2    0.45   -0.10
DLG    2:1    2       25   35   5    55   17   13   4    1    0.45   -0.59
DLG    1:2    1       52   8    10   50   19   20   2    4    0.76    0.06
DLG    1:5    3       51   9    17   43   17   15   2    7    0.58    0.18
HLB    5:1    3       39   20   14   47   15   15   10   2    0.41   -0.12
HLB    2:1    1       36   24   10   50   14   12   5    3    0.44   -0.26
HLB    1:2    4       32   31   16   41   14   12   5    11   0.21   -0.20
HLB    1:5    2       52   8    20   40   16   15   0    6    0.56    0.26
JA     5:1    1       50   9    13   48   19   21   8    1    0.66    0.09
JA     2:1    3       41   19   11   49   18   18   9    4    0.49   -0.16
JA     1:2    2       41   19   9    51   17   18   4    9    0.54   -0.21
JA     1:5    4       48   12   15   45   20   17   2    7    0.54    0.06
JH     5:1    3       45   15   5    55   19   16   4    0    0.76   -0.28
JH     2:1    1       41   19   18   42   16   14   9    4    0.35   -0.02
JH     1:2    2       44   14   23   39   15   13   5    10   0.36    0.13
JH     1:5    4       55   6    23   36   16   13   0    3    0.58    0.38
KMC    5:1    2       37   22   20   41   16   12   10   2    0.27   -0.04
KMC    2:1    4       36   25   22   37   12   13   9    3    0.19   -0.03
KMC    1:2    3       40   19   13   48   15   17   3    8    0.45   -0.12
KMC    1:5    1       42   18   22   38   14   16   2    12   0.30    0.07
PN     5:1    1       33   28   7    52   13   15   6    0    0.47   -0.40
PN     2:1    3       31   28   12   49   13   15   7    5    0.33   -0.28
PN     1:2    4       40   19   12   49   12   16   3    6    0.47   -0.14
PN     1:5    2       38   23   12   47   16   15   2    15   0.41   -0.19

The stimulus presentation probability (SPP)
was set at .5 throughout the experiment; that is,
on any trial, the participants were equally likely
to be presented with a stimulus array containing
“more blue” or “more yellow” greeblies. The
difficulty level of the discrimination was titrated
for each participant to make accuracy levels
across participants more equal. Each session
began with 56 greeblies of one color and 44
greeblies of the other color (56:44). After the
20th trial, the computer program analyzed
performance over the last 16 trials. If the
percentage of correct responses was greater
than 90% across the 16 trials, the proportions
were made more equal by a subtraction factor
of 2 (e.g., a 56:44 distribution was reduced to
54:46). If the percentage of correct responses
was between 70% and 90% for the previous 16
trials, the proportions were made more equal
by a subtraction factor of 1 (e.g., 56:44 became
55:45). If the percentage of correct responses
was between 60% and 70%, the proportions
remained the same. If the participant received
less than 60% correct, then the proportions
were made more different by a factor of 1 (e.g.,
56:44 became 57:43). The program then
continued to analyze the previous 16 trials after
every block of 10 trials, and titrated difficulty
ratios accordingly. Following the 60th trial, the
difficulty level (proportion of blue and yellow
greeblies) remained constant throughout the
remainder of the session. The most difficult
ratio was limited to 52:48, but there was no limit
set on the least difficult ratio.
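The titration rule described above can be sketched as a single update step. This is an illustration under stated assumptions and not the original Visual Basic program: the function name `titrate` and its parameters are hypothetical, and the handling of scores falling exactly on 60%, 70%, or 90% is an arbitrary choice (with a 16-trial window, percent correct moves in steps of 6.25%, so those exact values cannot occur).

```python
def titrate(majority, n_correct, n_trials=16, floor=52, total=100):
    """Return (majority, minority) greeblie counts after one titration check.

    majority  -- current count of the more numerous color (e.g., 56)
    n_correct -- correct responses in the last n_trials trials
    floor     -- hardest allowed majority count (52:48 in the procedure)
    """
    pct = 100.0 * n_correct / n_trials
    if pct > 90:
        step = -2   # much too easy: make proportions more equal by 2
    elif pct >= 70:
        step = -1   # somewhat easy: make proportions more equal by 1
    elif pct >= 60:
        step = 0    # target range: leave the ratio unchanged
    else:
        step = 1    # too hard: make proportions more different by 1
    new_majority = max(floor, majority + step)
    return new_majority, total - new_majority


# Starting ratio of 56:44 with 15 of the last 16 trials correct.
print(titrate(56, 15))  # prints (54, 46)
```

In the procedure this check would run after the 20th trial and then after every further block of 10 trials until the 60th trial, after which the ratio stays fixed.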

The relative distribution of reinforcers across 
the two response alternatives was allocated using 
interdependent scheduling (Stubbs & Pliskoff, 
1969), also known as a controlled procedure 
in behavioral signal-detection research (e.g., 
McCarthy & Davison, 1984), to ensure that 
arranged and obtained relative distributions 
were similar. The computer randomly scheduled
the next reinforced correct response
(“more blue” or “more yellow”) according
to the arranged reinforcer frequency ratio
(R₁₁:R₂₂). This varied across the four conditions
(5:1, 2:1, 1:2, and 1:5). For example, if the
participant was in the 5:1 condition, they were
five times more likely to receive reinforcers for
correctly responding on the left response box
(“more blue”) following a “more blue” stimulus
presentation (B₁₁) than for correctly
responding on the right response box (“more
yellow”) following a “more yellow” stimulus
presentation (B₂₂). The overall scheduled rate
of reinforcement across the two response 
alternatives was based on a VI 10-s schedule. 
The VI schedule timer ran through each trial 
(i.e., through the warning stimulus presenta- 
tion, array presentation, the time the participant 
took to respond, and the consequence), and 
only paused at the end of each trial (from the 
presentation of the “next trial” button to when 
the participant clicked on the button). Each 
session ended when the participant reached a 
total of 70 points, or when the participant 
reached the 400th trial, whichever came first.
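A rough sketch of the allocation step in interdependent scheduling is given below. It assumes the usual reading of the Stubbs and Pliskoff (1969) procedure, in which only one reinforcer is pending at a time and no new reinforcer is arranged until the pending one has been collected. The class and method names are hypothetical, and the VI 10-s timer is represented only by calls to `arm()`; the distribution of intervals is not specified in the text and is not modeled here.

```python
import random


class InterdependentScheduler:
    """Holds at most one pending reinforcer, assigned to B11 or B22."""

    def __init__(self, ratio=(5, 1), rng=None):
        # Probability that the next reinforcer is assigned to B11.
        self.p_b11 = ratio[0] / (ratio[0] + ratio[1])
        self.rng = rng or random.Random()
        self.pending = None  # "B11", "B22", or None

    def arm(self):
        """Call when the (assumed) VI timer elapses; assigns the next reinforcer."""
        if self.pending is None:
            self.pending = "B11" if self.rng.random() < self.p_b11 else "B22"

    def respond(self, outcome):
        """Return True if this response outcome collects the pending reinforcer."""
        if outcome is not None and outcome == self.pending:
            self.pending = None
            return True
        return False


# 5:1 condition: arm once and inspect which correct response will pay off.
scheduler = InterdependentScheduler(ratio=(5, 1), rng=random.Random(1))
scheduler.arm()
print(scheduler.pending)
```

Because a pending reinforcer blocks further scheduling, a participant who ignored one alternative would eventually stop earning points on the other, which is how this kind of procedure keeps obtained ratios close to arranged ratios.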

Group B. Group B participants performed a 
similar task to those in Group A. However, 
Group B participants also received occasional 
punishers (point losses) for errors. Screen 4 
was changed accordingly to: 


Screen 4: “Sometimes when you are correct you will 
gain a point. Sometimes nothing happens, you might 
be correct or wrong. Sometimes when you are wrong 
you will lose a point. When you get 60 points, the 
session will end and you can go!” 

Thus, there were three possible consequences
following each response. Like Group A, participants
could receive no consequence if neither a
reinforcer nor punisher was scheduled for that
particular response (i.e., 1-s blank screen), or a
reinforcer if they made a correct response (B₁₁
or B₂₂) and a reinforcer was scheduled for that
response (see Group A for details). The third
consequence occurred if the participant made
an incorrect response (B₁₂ or B₂₁) and a
punisher was scheduled. The statement “Incorrect!
You are one point further from finishing the
session!” appeared in the center of the screen for
2 s, accompanied by a 1-s “argh!” sound and the
thermometer bar. One space of the red bar was
deleted, showing that the bar had gone down.
All three consequence types were followed by a
1-s ITI and the presentation of the “next trial”
button. Although participants were informed
that the session ended after 60 points, like
Group A, the session actually ended when the
participant had obtained 70 point-gain reinforcers
(irrespective of how many point-loss punishers
they had received), or when the participant
reached the 400th trial, whichever came first.

Like Group A, the distributions of reinforc- 
ers and punishers were allocated using inter- 
dependent scheduling. The distribution of 
reinforcers was held constant and equal (1:1) 
for Group B; that is, participants received 
equal numbers of reinforcers for correct B₁₁
and B₂₂ responses. For punishers, the computer
program randomly scheduled the next
incorrect response to be punished according
to the arranged punisher ratio. For Group B,
the punisher frequency ratio (P₂₁:P₁₂) was
varied across the four conditions; these were 
5:1, 2:1, 1:2, and 1:5. The overall rate of 
reinforcement across the two response alter- 
natives was based on a VI 10-s schedule, while 
the overall rate of punishment was on a VI 20-s 
schedule. Both VI timers ran during each trial, 
and paused between the presentation of the 
“next trial” button and the participants’ 
response to the button. 

Results and Discussion 
Experimental sessions lasted approximately 
25 to 35 min, with an average of 277 trials 
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completed ( SD = 32.9). The last 120 trials 
from each experimental session were analyzed 
separately for each participant in Groups A 
and B of Experiment 1. For these data, the 
number of left button (“more blue”) respons- 
es following S1 (B11) and S2 (B12) and right 
button (“more yellow”) responses following 
S1 (B21) and S2 (B22) were calculated. The 
number of reinforcers (point gains) obtained 
for correct responses on each button (R11 and 
R22) and the number of punishers (point 
losses) obtained for errors on each button (P21 
and P12) were also calculated. Measures of 
discriminability (log d, Equation 4) and 
response bias (log b, Equation 5) were 
calculated for each participant from each 
condition (Table 1). 

Figure 4 (top) plots estimates of discrimina- 
bility (log d) across the four reinforcer or 
punisher ratios for each participant in Groups 
A (left) and B (right) of Experiment 1. 
Estimates of discriminability did not significant- 
ly differ across the four conditions for partici- 
pants in Group A, F(3,15) = 1.244, p = .33, or 
Group B, F(3,15) = 1.270, p = .32. This 
independence between discriminability and 
relative reinforcer (Group A) and punisher 
(Group B) frequency was consistent with 
Davison and Tustin’s (1978) model. However, 
a mixed 4 (Condition) × 2 (Group) analysis of 
variance (ANOVA) found that the difference in 
discriminability between the two groups ap- 
proached significance, F(1,10) = 4.474, p = .06; 
that is, mean discriminability for Group B 
participants (M = .46) was somewhat higher 
than mean discriminability for Group A partic- 
ipants (M = .35). It is possible this was a result 
of participants in Group B receiving more 
feedback than those in Group A. For example, 
Group A participants obtained 70 reinforcers 
(points) in an average of 271 trials; that is, 
about 26% of trials ended with feedback. In 
comparison, Group B participants obtained 70 
reinforcers and an average of 24.5 punishers in 
an average of 282 trials; that is, about 34% of 
their trials ended with feedback (i.e., reinforce- 
ment or punishment). However, it is also 
possible that the sample of participants chosen 
for Group B were better at numerosity judg- 
ments than participants in Group A, irrespec- 
tive of the punisher contingencies. 
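
The feedback percentages quoted above follow directly from the reported means. The short check below reproduces them; the function name is hypothetical and the inputs are the averages given in the text.

```python
def feedback_share(reinforcers, punishers, trials):
    """Proportion of trials that ended with feedback of either kind."""
    return (reinforcers + punishers) / trials

group_a = feedback_share(70, 0, 271)       # Group A: reinforcers only
group_b = feedback_share(70, 24.5, 282)    # Group B: reinforcers plus punishers

print(f"Group A: {group_a:.0%}, Group B: {group_b:.0%}")
```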

Figure 4 (bottom) plots estimates of response 
bias (log b) across the four reinforcer or punisher 
ratios for each participant in Groups A (left) and 
B (right) of Experiment 1.¹ For Group A, 
estimates of response bias differed significantly 
across conditions, F(3,15) = 13.38, p < .001. 
Individual estimates of sensitivity (i.e., slopes) 
calculated using least squares linear regression 
analyses on the response bias data for each 
participant found positive slopes for 5 of the 6 
participants (M = .36), and a one-sample t-test 
performed on these slopes confirmed that they 
were significantly greater than zero, t(5) = 4.355, 
p < .01. In other words, participants in Group A 
were systematically biased towards responding on 
the alternative associated with the higher fre- 
quency of reinforcement. These results were 
consistent with the standard (reinforcement- 
only) version of Davison and Tustin’s (1978) 
model (Equation 5). Furthermore, mean sensi- 
tivity (0.36) was comparable to those obtained in 
previous human detection experiments (e.g., 
Alsop, et al., 1995; Johnstone & Alsop, 1996). 
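
The two analysis steps described above (a least squares slope per participant, then a one-sample t-test on the slopes) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical and the response bias values are invented so that the slope is known in advance; they are not data from the experiment.

```python
import math

def ls_slope(x, y):
    """Least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def one_sample_t(values, mu=0.0):
    """t statistic for the mean of values against mu (df = n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu) / math.sqrt(variance / n)

# Log reinforcer ratios for the arranged 5:1, 2:1, 1:2, and 1:5 conditions.
log_ratios = [math.log10(r) for r in (5 / 1, 2 / 1, 1 / 2, 1 / 5)]
# Hypothetical log b values lying exactly on a line with slope 0.4.
log_b = [0.4 * x + 0.1 for x in log_ratios]

sensitivity = ls_slope(log_ratios, log_b)
t_value = one_sample_t([1.0, 2.0, 3.0])    # toy slopes, not reported values
```

In practice the regression would use the obtained rather than the arranged reinforcer ratios, and the t-test would take one slope per participant.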

For Group B, estimates of response bias also 
varied systematically across the four condi- 
tions, F(3,15) = 4.387, p < .05, and least 
squares linear regression analyses performed 
on each participant found negative slopes for 
5 of the 6 participants (M = -.20). A one- 
sample t-test confirmed that estimates of 
sensitivity for Group B participants were 
significantly less than zero, t(5) = 2.765, p 
< .05; that is, participants in Group B were 
systematically biased away from the response 
alternative associated with the higher frequen- 
cy of point-loss punishment. Although the 
mean slope (-0.20) obtained from Group B 
was shallower than that obtained from Group 
A (0.36), this is perhaps not surprising because 
participants in Group B also received equal 
rates of reinforcers (at a higher overall rate 
than the punishers); thus, the effect of the 
reinforcers should attenuate the effect of the 
punishers. The results from Group B were 
consistent with both punishment versions of 
Davison and Tustin’s (1978) signal-detection 
model (Equations 6 to 11; Figure 2, top), 
which predict a negative relation between 
relative punisher frequency and response bias. 
Overall, Experiment 1 demonstrated that 
point gains for correct responses were effective 
reinforcers (Group A) and that point losses for 
errors were effective punishers (Group B) for 
human participants in a detection procedure. 

¹ A summary of these data was presented in a short 
theoretical article by Lie and Alsop (2007). 

[Figure 4. Left panels: Group A (varied reinforcer ratio); x-axis, log reinforcer ratio (R11/R22). Right panels: Group B (varied punisher ratio); x-axis, log punisher ratio (P21/P12). Symbols mark individual participants and the mean.] 

Fig. 4. Discriminability (log d - top) and response bias (log b - bottom) are plotted over changes in relative 
reinforcer frequency (log R11/R22) for Group A (left) and relative punisher frequency (log P21/P12) for Group B (right) 
of Experiment 1. Individual participant data and the overall means are given. 


EXPERIMENT 2 

Experiment 1 established that point gains 
and losses were effective reinforcers and 
punishers, respectively, and that the punishers 
had similar but opposite effects to reinforcers 
on human signal-detection performance. Ex- 
periments 2 and 3 examined whether additive 
(e.g., Deluty, 1976) or subtractive (e.g., de 
Villiers, 1980; Farley, 1980) models of punish- 
ment better model the effects of punishment 
in signal detection. These two competing 
models make different predictions when rela- 
tive reinforcer frequency is varied and a 
constant and equal rate of punishment is 
superimposed on both response alternatives. 
When relative reinforcer frequency is varied in 
the absence of punishment, behavior can be 
described by Davison and Tustin’s (1978) 
GML-based model of signal detection (Equa- 
tion 5, and Figure 2 bottom, solid line). 
However, when a constant and equal rate of 
punishment is also included, the additive 
punishment version of Davison and Tustin’s 
model (Equation 8) predicts a reduced pref- 
erence for the more reinforced response 
alternative (Figure 2 bottom, dotted line), 
while the subtractive punishment version of 
Davison and Tustin’s model (Equation 11) 
predicts an increased preference for the more 
reinforced alternative (Figure 2 bottom, 
dashed line). 

Although no published studies have tested 
the predictions of the two competing models 
using signal-detection procedures, some re- 
search has tested the two models using 
standard concurrent-schedule procedures 
(Critchfield, et al., 2003; de Villiers, 1980; 
Deluty, 1976; Farley, 1980). One approach 
involves arranging a constant and unequal 
distribution of reinforcers across two alterna- 
tives (reinforcer-only [R-only] condition) then 
superimposing a constant and equal distribu- 
tion of punishers (reinforcer + punisher [R+P] 
condition) and measuring preference under 
both condition types. Using this arrangement, 
the subtractive model predicts increased pref- 
erence for the richer (i.e., more reinforced) 
alternative with the inclusion of punishment 
(i.e., greater preference in R+P conditions 
than R-only conditions), while an additive 
model predicts decreased preference for the 
richer alternative (i.e., greater preference in R- 
only conditions than R+P conditions). 
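
The opposite directional predictions can be illustrated with the simplest strict-matching forms of the two models. The sketch below is an assumption-laden illustration: it does not reproduce Equations 8 and 11, the function names are hypothetical, and the punisher values (0.5 per alternative, expressed in reinforcer-equivalent units) are invented for the example.

```python
def additive_ratio(r1, r2, p1, p2):
    """Additive form: punishers on one alternative add to the value of the other."""
    return (r1 + p2) / (r2 + p1)

def subtractive_ratio(r1, r2, p1, p2):
    """Subtractive form: punishers subtract from the value of their own alternative."""
    return (r1 - p1) / (r2 - p2)

r_only = 5 / 1                                     # 5:1 reinforcer ratio, no punishers
additive = additive_ratio(5, 1, 0.5, 0.5)          # 5.5 / 1.5
subtractive = subtractive_ratio(5, 1, 0.5, 0.5)    # 4.5 / 0.5
```

With equal punishers superimposed, the additive form pulls the predicted preference ratio below 5 (toward indifference), whereas the subtractive form pushes it above 5 (toward the richer alternative).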

A number of researchers have taken this 
approach. Using pigeons as subjects and 
electric shock punishers, both Farley (1980) 
and de Villiers (1980) found increased prefer- 
ence for the rich alternative in conditions 
where electric shock was superimposed across 
both alternatives (R+P conditions) when com- 
pared to a baseline condition where unequal 
concurrent schedules of reinforcement were 
presented without electric shock (R-only con- 
dition). Critchfield et al. (2003) also found an 
increase in preference for the rich alternative 
when an equal distribution of point-loss 
punishers was superimposed on unequal con- 
current schedules of point-gain reinforcement 
using human participants. Thus, these studies 
unanimously supported a subtractive model of 
punishment over an additive model for con- 
current-schedule performance. 

Experiment 2 used the same perceptual 
discrimination task as Experiment 1 and 
arranged two conditions: an R-only condition 
where the reinforcer ratio was held constant 
and unequal at either 1:5 or 5:1 with no 
punishers for errors, and an R+P condition 
where the same reinforcer ratio was arranged 
but with a 1:1 punisher ratio superimposed. A 
comparison of estimates of response bias 
between R-only and R+P conditions should 
indicate whether an additive or subtractive 
punishment model better describes human 
signal-detection performance. 

Method 

Participants 

Undergraduate students at Victoria Univer- 
sity of Wellington participated as part of an 
optional piece of assessment. There were 8 
males and 8 females aged between 18 and 35 
years (M = 20.1 years). 

Apparatus 

The apparatus and stimuli were similar to 
those used in Experiment 1. However, the 
experiment was conducted in a room approx- 
imately 5 m × 5 m and the task was presented 
on a 43 cm (17") LCD screen. Half the 
participants were presented with 10 × 10 
arrays of blue and yellow greeblies (the same 
as Experiment 1) while the other half were 
presented with 10 × 10 arrays consisting of red 
and (darker) blue greeblies. (Anecdotal evi- 
dence from the first experiment suggested 
that the yellow greeblies were more salient 
than the blue greeblies.) Due to changes in 
screen size and resolution for Experiment 2, 
the 10 × 10 arrays measured approximately 
118 mm wide by 119 mm high, with each 
greeblie measuring approximately 8.5 mm 
wide by 9.0 mm high (i.e., slightly smaller 
than in Experiment 1). All other aspects of the 
apparatus and stimuli were identical to Exper- 
iment 1. 

Procedure 

The general procedure for Experiment 2 
was similar to Experiment 1. However, there 
were only four condition types in Experiment 
2: two R-only conditions (5:1R and 1:5R) and 
two R+P conditions (5:1P and 1:5P). The R- 
only conditions were identical to the 5:1 and 
1:5 conditions for Group A of Experiment 1; 
that is, correct responses were occasionally 
reinforced and there were no punishers for 
errors. The reinforcer ratio was held con- 
stant at 5:1 (i.e., 5:1R condition) or 1:5 (i.e., 
1:5R condition) throughout the session 
using interdependent scheduling and the 
rate of reinforcement was based on a VI 10- 
s schedule. 

The R+P conditions were similar to Group B 
of Experiment 1; that is, correct responses 
were occasionally reinforced while errors were 
occasionally punished. For Experiment 2 
however, the reinforcer ratio was held constant 
and unequal at 5:1 or 1:5 throughout the 
session, with a constant and equal (1:1) rate of 
point-loss punishers superimposed (5:1P and 
1:5P conditions, respectively). Like Group B of 
Experiment 1, the rate of reinforcement was 
based on a VI 10-s schedule and the rate of 
punishment was based on a VI 20-s schedule. 
For all conditions, SPP was set at .5, a titration 
procedure was used (see Experiment 1), and 
each session ended after the participant had 
obtained 70 points or reached 400 trials, 
whichever came first. 

Each participant received three experimen- 
tal sessions but was only presented with two 
conditions. Participants received the two con- 
ditions in one of two orders. For Order 1, an R- 
only condition was presented first, followed by 
an R+P condition, then the same R-only 
condition again (i.e., an ABA design). For 
Order 2, an R+P condition was presented first, 
followed by an R-only condition, then the R+P 
condition again (i.e., a BAB design). Partici- 
pants were presented with the same reinforcer 
ratio (i.e., 5:1 or 1:5) and the same stimulus 
type (blue-yellow or blue-red) across all three 
sessions, and this was counterbalanced across 
all participants (Table 2). 
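
The design above crosses session order, reinforcer ratio, and stimulus type. The sketch below enumerates those cells and the session labels each one implies. The names are hypothetical, and the assumption that participants were spread evenly over the cells is not stated in the text.

```python
from itertools import product

# Session sequences for the two orders described above.
SESSION_SEQUENCES = {
    1: ("R-only", "R+P", "R-only"),   # ABA design
    2: ("R+P", "R-only", "R+P"),      # BAB design
}

def design_cells():
    """Enumerate every order x reinforcer ratio x stimulus type combination."""
    return list(product(SESSION_SEQUENCES, ("5:1", "1:5"),
                        ("blue-yellow", "blue-red")))

def sessions_for(order, ratio):
    """Condition labels for the three sessions of one participant."""
    suffix = {"R-only": "R", "R+P": "P"}
    return [f"{ratio}{suffix[kind]}" for kind in SESSION_SEQUENCES[order]]

cells = design_cells()
example = sessions_for(1, "1:5")      # ['1:5R', '1:5P', '1:5R']
```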

Results and Discussion 

Experimental sessions lasted approximate- 
ly 30 to 40 min, with an average of 338 trials 
completed (SD = 41.2). The last 120 trials 
from each experimental session were ana- 
lyzed for each participant in the same 
manner as Experiment 1. However, log b 
was calculated for all conditions with the rich 
alternative in the numerator (i.e., positive 
log b values reflected preference for the rich 
alternative). These data are presented in 
Table 2. 

Figure 5 (top) plots estimates of discrimi- 
nability (log d) for each participant who 
received Order 1 (left) or Order 2 (right) 
across the three sessions in Experiment 2. 
Like Experiment 1, estimates of discriminabil- 
ity did not differ significantly across the three 
sessions for participants who sat Order 1, 
F(2,12) = .752, p = .49, or Order 2, F(2,14) = 
2.351, p = .13, although the means (Figure 5, 
solid lines) suggest that estimates of discrim- 
inability were slightly lower for R-only condi- 
tions when compared to R+P conditions, 
consistent with Experiment 1. A 3 (Session) 
× 2 (Order) ANOVA found that mean 
discriminability differed significantly between 
the two condition orders (Order 1: M = 0.73; 
Order 2: M = 0.99), F(1,13) = 12.46, p < .01, 
indicating that those who received two R+P 
conditions (Order 2) responded more accu- 
rately than those who only received one R+P 
condition (Order 1). Like Experiment 1, this 
could be the result of the increased feedback 
received in R+P conditions, or due to be- 
tween-group differences. 

A cursory examination of Figures 4 (Exper- 
iment 1) and 5 (Experiment 2) finds that 
discriminability estimates from Experiment 2 
appear greater than Experiment 1. Estimates 
of discriminability were averaged across all 
three sessions for each participant in Experi- 
ment 2, and also across the 1:5 and 5:1 
conditions for each participant in Experiment 
1, and a two-sample t-test found a significant 
difference between Experiment 1 (M = .42) 
and Experiment 2 (M = .86), t(26) = 6.756, p 
< .001. This is not surprising, however, due to 
the changes in characteristics (i.e., different 
participant pools, changes in computer screen 
and stimulus array sizes, and in some cases, 
changes in stimulus array colors) between the 
two experiments. 

Figure 5 (bottom) plots estimates of re- 
sponse bias (log b) for each participant who 
received Order 1 (left) or Order 2 (right) 
across the three sessions in Experiment 2. A 
cursory examination of the means for Order 1 

Table 2 

The numbers of B11, B12, B21, and B22 responses, R11 and R22 reinforcers, P21 and P12 punishers, 
and estimates of discriminability (log d) and response bias (log b) calculated across the last 120 
trials for each participant in each condition of Experiment 2. 

Part. Cond.  Order  B11  B12  B21  B22  R11  R22  P21  P12  log d  log b

ORDER 1
AIS   1:5R   1       42   18   19   41   19    5    0    0   0.70   0.03
      1:5P   2       54    6   34   26   25    4    5    4   0.84   1.07
      1:5R   3       48   11   33   28   22    5    0    0   0.57   0.71
AR    1:5R   1       44   17   20   39   24    6    0    0   0.70   0.12
      1:5P   2       53    7   24   36   25    5    3    5   1.06   0.70
      1:5R   3       50   11   27   32   25    6    0    0   0.73   0.58
JM    1:5R   1       52   10   28   32   18    6    0    0   0.77   0.66
      1:5P   2       51    9   31   29   25    4    2    5   0.72   0.78
      1:5R   3       51   10   31   28   19    4    0    0   0.66   0.75
MB    5:1R   1       32   28   14   46    4   26    0    0   0.57   0.46
      5:1P   2       29   31    4   56    6   22    2    4   1.12   1.18
      5:1R   3        -    -    -    -    -    -    -    -      -      -
SP    5:1R   1       50   10   18   42    4   19    0    0   1.07  -0.33
      5:1P   2       51    8   17   44    5   22    2    4   1.22  -0.39
      5:1R   3       31   29   11   49    3   21    0    0   0.68   0.62
TK    5:1R   1       34   26   19   41    4   22    0    0   0.45   0.22
      5:1P   2       34   24   11   51    5   26    7    4   0.82   0.51
      5:1R   3       33   27   20   40    5   18    0    0   0.39   0.21
YH    1:5R   1       42   18   27   33   21    5    0    0   0.46   0.28
      1:5P   2       55    7   29   29   26    5    3    6   0.90   0.90
      1:5R   3       49   11   30   30   29    5    0    0   0.65   0.65
YWO   5:1R   1       41   18   24   37    4   21    0    0   0.55  -0.17
      5:1P   2       12   48    5   55    3   18    5    3   0.44   1.64
      5:1R   3       37   23    7   53    4   19    0    0   1.09   0.67

ORDER 2
AC    5:1P   1       50   10   25   35    4   20    5    5   0.85  -0.55
      5:1R   2       45   15   19   41    3   22    0    0   0.81  -0.14
      5:1P   3       45   15    3   57    6   27    3    1   1.76   0.80
AS    1:5P   1       56    4   34   26   18    5    6    3   1.03   1.26
      1:5R   2       49   11   32   28   19    4    0    0   0.59   0.71
      1:5P   3       54    6   33   27   19    4    3    4   0.87   1.04
BD    1:5P   1       56    5   12   47   23    5    2    5   1.64   0.46
      1:5R   2       56    3   40   21   20    4    0    0   0.99   1.55
      1:5P   3       57    3   42   18   18    3    5    2   0.91   1.65
CL    1:5P   1       40   20    9   51   22    4    4    5   1.05  -0.45
      1:5R   2       43   16   24   37   20    4    0    0   0.62   0.24
      1:5P   3       52    8   34   26   23    4    4    4   0.70   0.93
LC    1:5P   1       57    3   31   29   20    4    3    2   1.25   1.31
      1:5R   2       50    9   31   30   20    5    0    0   0.73   0.76
      1:5P   3       58    2   23   37   21    5    0    2   1.67   1.26
RC    5:1P   1       35   26    9   50    5   22    6    4   0.87   0.62
      5:1R   2       40   21    6   53    6   23    0    0   1.23   0.67
      5:1P   3       28   32    3   57    7   23    3    2   1.22   1.34
RS    5:1P   1       49   11   24   36    6   24    5    4   0.82  -0.47
      5:1R   2       33   27    8   52    6   29    0    0   0.90   0.73
      5:1P   3       36   24    9   51    6   25    5    8   0.93   0.58
TS    5:1P   1       32   28    7   53    4   27    4    7   0.94   0.82
      5:1R   2       20   39    7   54    4   25    0    0   0.60   1.18
      5:1P   3       17   43    3   57    4   21    3    1   0.88   1.68

[Figure 5. Left panels: Order 1; right panels: Order 2. x-axis: condition.] 

Fig. 5. Discriminability (log d - top) and response bias (log b - bottom) are plotted over the three sessions for Order 
1 (R-only, R+P, R-only - left) and Order 2 (R+P, R-only, R+P - right) of Experiment 2. Individual participant data and the 
overall means are given. 


(Figure 5, bottom left, solid lines) shows an 
increase in response bias from the first session 
(R-only condition, log b = 0.15) to the second 
session (R+P condition, log b = 0.79), with a 
slight decrease on the third session (R-only 
condition, log b = 0.60). This pattern was fairly 
consistent across 6 of the 8 participants,² and a 
Friedman test found a significant difference in 
response bias across the three sessions, χ² = 
7.143, df = 2, p < .05. However, paired-sample 
t-tests only found a significant increase from 
R-only to R+P (Sessions 1 to 2), t(7) = 2.960, 
p < .05. For participants who received Order 2 
(Figure 5, bottom right), there was an increase 
in mean estimates of log b across the three 
sessions (log b = 0.37, 0.71, 1.16), with 6 out of 
8 participants showing an increase from 
Session 1 (R+P) to Session 2 (R-only), and 6 
participants showing an increase from Session 
2 (R-only) to Session 3 (R+P). While the 
difference across the sessions approached 
significance using a Friedman test, χ² = 
5.250, df = 2, p = .07, paired-sample t-tests 
only found a significant increase in response 
bias from R-only to R+P (Sessions 2 to 3), t(7) 
= 3.634, p < .01. 

² Participant MB (Figure 5 left, filled diamonds) 
completed all three sessions but data from the final 
session were lost due to a computer error. However, the 
increase in MB’s response bias from the first to second 
session was also consistent with the mean findings. 
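
The Friedman statistics reported above can be recomputed from the log b values in Table 2. The sketch below is a plain implementation of the rank-sum formula without a tie correction (no ties occur within any participant here). Leaving out participant MB, whose third session is missing, is an assumption about how the missing data were handled, although it yields the reported value for Order 1.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def friedman_chi_square(rows):
    """Friedman statistic for rows of repeated measures (no tie correction)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)

# log b for Sessions 1 to 3 (Table 2); Order 1 omits MB (Session 3 missing).
order_1 = [(0.03, 1.07, 0.71), (0.12, 0.70, 0.58), (0.66, 0.78, 0.75),
           (-0.33, -0.39, 0.62), (0.22, 0.51, 0.21), (0.28, 0.90, 0.65),
           (-0.17, 1.64, 0.67)]
order_2 = [(-0.55, -0.14, 0.80), (1.26, 0.71, 1.04), (0.46, 1.55, 1.65),
           (-0.45, 0.24, 0.93), (1.31, 0.76, 1.26), (0.62, 0.67, 1.34),
           (-0.47, 0.73, 0.58), (0.82, 1.18, 1.68)]

chi_1 = friedman_chi_square(order_1)
chi_2 = friedman_chi_square(order_2)
```

Both statistics are referred to a chi-square distribution with k - 1 = 2 degrees of freedom.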

Overall, the results from both orders found 
significant increases in preference (i.e., re- 
sponse bias) from conditions that held the 
reinforcer ratio constant and unequal (5:1 or 
1:5) with no punishment (R-only) to condi- 
tions that superimposed a constant and equal 
rate (1:1) of punishment onto unequal rates of 
reinforcement (R+P). This is consistent with 
the qualitative predictions made by a subtrac- 
tive punishment version of Davison and 
Tustin’s (1978) signal-detection model. It is 
also consistent with the findings from the 
concurrent-schedules literature (Critchfield, et 
al., 2003; de Villiers, 1980; Farley, 1980). 
However, the present experiment only tested 
the directional predictions of the additive 
versus subtractive models of punishment using 
one unequal reinforcer ratio per participant. It 
is unclear whether there would be an increase 
in preference for the rich alternative across a 
number of different unequal reinforcer ratios. 
It is also possible that the increase in prefer- 
ence found in the present experiment only 
occurred due to repeated exposure to the 1:5 
or 5:1 reinforcer ratio. In fact, the significant 
linear trends across the three sessions for 
Order 1, F(1,7) = 14.82, p < .01, and Order 2, 
F(1,7) = 13.11, p < .01, suggest that this 
might have been the case. However, Johnstone 
and Alsop (1996) found that increased expo- 
sure to a constant and unequal reinforcer ratio 
did not significantly change human response 
bias patterns across four sessions when they 
used a similar detection procedure (albeit 
without punishers for errors). Overall, al- 
though the present experiment found support 
for a subtractive model of punishment, a 
larger study arranging several reinforcer ratios 
was needed. 

EXPERIMENT 3 

Like the previous experiment, Experiment 3 
also tested the predictions of the additive versus 
subtractive models of detection performance. 
However, Experiment 3 arranged four different 
unequal reinforcer ratios (5:1, 2:1, 1:2, 1:5), 
both without (R-only) and with (R+P) a 
constant and equal (1:1) rate of punishment 
for errors. This approach has been taken by 
Critchfield et al. (2003) with human partici- 
pants and Farley (1980) with pigeons using 
concurrent-schedule procedures. Both Critch- 
field et al. and Farley presented their subjects 
with conditions which varied the relative 
frequency of reinforcers across the two alterna- 
tives (Critchfield, et al.: from 7:1 to 1:7; Farley: 
from 4:1 to 1:6) with and without a constant 
and equal (1:1) rate of punishment superim- 
posed across both alternatives. When estimates 
of sensitivity (a, Equation 1) were compared 
between reinforcer-only and reinforcer + pun- 
isher conditions, both studies found increased 
sensitivity with the inclusion of punishment, 
consistent with a subtractive model of punish- 
ment. If an increase in sensitivity is found in the 
present experiment, this would support a 
subtractive punishment model of detection 
performance (Figure 2 bottom, dashed line), 
consistent with the findings from concurrent- 
schedule procedures that have arranged similar 
conditions (Critchfield, et al., 2003; Farley, 
1980) and also the detection procedure used 
in Experiment 2. 

Method 

Participants 

Eight university students were recruited 
from a job recruitment agency for students at 
the University of Otago. Each participant 
received NZ$80 after the completion of their 
eighth and final session. There were 3 males 
and 5 females aged between 19 and 24 years 
(M = 21.3 years). 

Apparatus 

The experiment was conducted in the same 
room as Experiment 1 and the task was 
presented on a 43 cm (17") LCD monitor. 

Table 3 

The numbers of B11, B12, B21, and B22 responses, R11 and R22 reinforcers, P21 and P12 punishers, 
and estimates of discriminability (log d) and response bias (log b) calculated across the last 120 
trials for each participant in each condition of Experiment 3. 

Part. Cond.  Order  B11  B12  B21  B22  R11  R22  P21  P12  log d  log b

CB    5:1R   4       43   18   17   42   25    5    0    0   0.39  -0.01
      2:1R   2       32   28   25   35   13    6    0    0   0.10  -0.04
      1:2R   3       45   20    9   56   15   24    0    0   0.57  -0.22
      1:5R   1       34   27   18   41    4   26    0    0   0.23  -0.13
      5:1P   8       31   29   17   43   17    5    7    8   0.22  -0.19
      2:1P   6       29   31   10   50   15    8    7    7   0.34  -0.36
      1:2P   7       46   14    3   57   12   22    3    5   0.90  -0.38
      1:5P   5       28   32    5   55    5   28    3    7   0.49  -0.55
CM    5:1R   3       39   20   20   41   22    3    0    0   0.30  -0.01
      2:1R   1       45   16   28   31   18    9    0    0   0.25   0.20
      1:2R   4       50   10   19   41    8   19    0    0   0.52   0.18
      1:5R   2       32   28   17   43    4   21    0    0   0.23  -0.17
      5:1P   7       32   28    8   52   20    5    7    2   0.44  -0.38
      2:1P   5       39   21   20   40   18    9    8    6   0.28  -0.02
      1:2P   8       28   33    8   51    8   17    5    1   0.37  -0.44
      1:5P   6       32   27   14   47    5   24    6   10   0.30  -0.23
CY    5:1R   5       49   11   23   37   23    6    0    0   0.43   0.22
      2:1R   7       40   20   13   47   20    8    0    0   0.43  -0.13
      1:2R   6       44   16   15   45   11   18    0    0   0.46  -0.02
      1:5R   8       26   33   11   50    4   23    0    0   0.28  -0.38
      5:1P   1       51    9   32   28   22    4    7    8   0.35   0.41
      2:1P   3       44   17   18   41   16    9    7    6   0.39   0.03
      1:2P   2       42   18   17   43    9   17    8    8   0.39  -0.02
      1:5P   4       27   35    0   58    4   23    0    0   0.98  -1.09
DK    5:1R   7       46   14   26   34   22    6    0    0   0.32   0.20
      2:1R   5       49   11   24   36   22    9    0    0   0.41   0.24
      1:2R   8       32   28   13   47    9   16    0    0   0.31  -0.25
      1:5R   6       31   29    7   53    5   24    0    0   0.45  -0.43
      5:1P   3       52    8   24   36   25    5    6    4   0.49   0.32
      2:1P   1       40   21   22   37   21   10   11   11   0.25   0.03
      1:2P   4       38   22    9   51   10   20    5    5   0.50  -0.26
      1:5P   2       26   34    4   56    6   30    3    2   0.51  -0.63
JF    5:1R   6       51    8   24   37   26    6    0    0   0.50   0.31
      2:1R   8       53    8   10   49   22   12    0    0   0.76   0.07
      1:2R   5       47   13   14   46    9   21    0    0   0.54   0.02
      1:5R   7       42   16   10   52    6   25    0    0   0.57  -0.15
      5:1P   2       59    1   42   18   25    4    0    1   0.70   1.07
      2:1P   4       54    6   21   39   24   12    5    3   0.61   0.34
      1:2P   1       55    6   18   41   10   22    6    5   0.66   0.30
      1:5P   3       50    9   19   42    5   25    6    7   0.54   0.20
LSS   5:1R   8       51   10   13   46   24    5    0    0   0.63   0.08
      2:1R   6       45   15   12   48   19    9    0    0   0.54  -0.06
      1:2R   7       45   15   13   47   11   20    0    0   0.52  -0.04
      1:5R   5       39   20   13   48    5   26    0    0   0.43  -0.14
      5:1P   4       47   12   20   41   26    6    8    7   0.45   0.14
      2:1P   2       52    8   21   39   21   11    6    6   0.54   0.27
      1:2P   3       52    9   19   40    9   21    4    6   0.54   0.22
      1:5P   1       30   29    9   52    6   22    8    6   0.39  -0.37
MK    5:1R   2       42   18   26   34   22    4    0    0   0.24   0.13
      2:1R   4       26   23   20   31   11    8    0    0   0.12  -0.07
      1:2R   1       30   30   19   41    8   20    0    0   0.17  -0.17
      1:5R   3       21   39   11   49    6   26    0    0   0.19  -0.46
      5:1P   6       32   28   26   34   24    4   11    8   0.09  -0.03
      2:1P   8       37   23   18   42   21   10    9   10   0.29  -0.08
      1:2P   5       22   38    8   52   10   20    4    7   0.29  -0.53
      1:5P   7       30   31    8   51    5   24    6    3   0.40  -0.41
NC    5:1R   1       56    3   40   21   22    5    0    0   0.50   0.78
      2:1R   3       51   10   18   41   22    7    0    0   0.53   0.18
      1:2R   2       38   20   13   49    8   18    0    0   0.43  -0.15
      1:5R   4       34   25   18   44    4   25    0    0   0.26  -0.13
      5:1P   5       49   11   23   37   24    5    5    6   0.43   0.22
      2:1P   7       51    9   35   25   13    9    4    4   0.30   0.45
      1:2P   6       53    7   27   33    6   16    3    6   0.48   0.40
      1:5P   8       43   17   16   44    4   24    5    7   0.42  -0.02

The stimuli were 12 × 12 arrays (115 mm wide 
× 125 mm high) in the center of the white 
screen, with each position of the array occupied 
by either a blue or a red “greeblie” 
(measuring 8 mm wide and 9 mm high) 
against a white background. “More blue” 
stimuli consisted of 75 random array positions 
filled with blue greeblies and 69 random array 
positions filled with red greeblies. “More red” 
stimuli contained 75 random array positions 
filled with red greeblies and 69 random array 
positions filled with blue greeblies. Participants 
responded on a two-key response panel 
(with telegraph Morse keys) connected to the 
computer’s USB port via a LabJack™ interface 
device. Beside the left key was a picture of 
a blue greeblie (indicating the response for 
“more blue”), and beside the right key was a 
picture of a red greeblie (indicating the 
response for “more red”). 

Procedure 

There were eight conditions in Experiment 
3. Four conditions varied the reinforcer ratio 
without punishment for errors (similar to the 
R-only conditions in Experiment 2); the four 
ratios used were 5:1, 2:1, 1:2, and 1:5. These 
R-only conditions were labeled 5:1R, 2:1R, 1:2R, 
and 1:5R, respectively (Table 3). The distribution 
of reinforcers was varied using interdependent 
scheduling with the overall rate of 
reinforcement based on a VI 10-s schedule. 
Another four conditions also varied the 
reinforcer ratio (5:1, 2:1, 1:2, and 1:5), but 
included punishment for errors (similar to the 
R+P conditions in Experiment 2). These R+P 
conditions were labeled 5:1P, 2:1P, 1:2P, and 
1:5P, respectively (Table 3). The distribution 
of punishers was held constant and equal (1:1) 
using interdependent scheduling, with overall 
rates of reinforcement and punishment based 
on VI 10-s schedules. For all conditions, SPP 
was set at .5. The difficulty levels for each 
condition were not titrated (i.e., all stimuli 
contained 75 greeblies of one color and 69 
greeblies of the other). 
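
The fixed-difficulty stimuli described above (75 positions of one color and 69 of the other in a 12 × 12 array) can be generated as in the following sketch. The function name, color labels, and seed are hypothetical, and the code is not the authors' stimulus program.

```python
import random

def make_stimulus(majority, minority, rng, side=12, n_majority=75):
    """Return a side x side grid with n_majority cells of the majority color."""
    cells = [majority] * n_majority + [minority] * (side * side - n_majority)
    rng.shuffle(cells)                  # random array positions
    return [cells[row * side:(row + 1) * side] for row in range(side)]

rng = random.Random(1)                  # hypothetical seed
more_blue = make_stimulus("blue", "red", rng)
n_blue = sum(row.count("blue") for row in more_blue)
n_red = sum(row.count("red") for row in more_blue)
```

Swapping the first two arguments gives a “more red” stimulus with the same 75:69 split.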

Participants received one condition per 
session. Sessions were conducted no less than 
24 hours apart and no more than one week 
apart. The presentation order of the condi- 
tions was partially counterbalanced across all 
the participants, with the constraints that all 
four R-only conditions and all four R+P 
conditions were sat consecutively, and that 
no two consecutive conditions arranged the 
greater reinforcer frequency on the same key 
(Table 3). 
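
The two ordering constraints can be expressed as a simple check on a sequence of condition labels. In the sketch below the function names are hypothetical, and it is assumed that the first term of each ratio refers to the left key, so that 5:1 and 2:1 place the greater reinforcer frequency on the left and 1:2 and 1:5 on the right. The example sequence is participant CB's order from Table 3.

```python
def rich_key(condition):
    """Return 'left' when the first term of the ratio is larger, else 'right'."""
    left, right = condition[:-1].split(":")
    return "left" if int(left) > int(right) else "right"

def is_valid_order(conditions):
    """Check both ordering constraints for a sequence of eight conditions."""
    kinds = [c[-1] for c in conditions]          # 'R' (R-only) or 'P' (R+P)
    blocks_ok = (len(set(kinds[:4])) == 1 and len(set(kinds[4:])) == 1
                 and kinds[0] != kinds[4])
    alternates = all(rich_key(a) != rich_key(b)
                     for a, b in zip(conditions, conditions[1:]))
    return blocks_ok and alternates

cb_order = ["1:5R", "2:1R", "1:2R", "5:1R", "1:5P", "2:1P", "1:2P", "5:1P"]
cb_ok = is_valid_order(cb_order)
bad_ok = is_valid_order(["5:1R", "2:1R", "1:2R", "1:5R",
                         "5:1P", "2:1P", "1:2P", "1:5P"])
```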

The general procedure and instructions 
were similar to those presented in Experiment 
2. However, some additional feedback was 
presented during the consequence screens 
corresponding to the response made. If the 
participant made a correct left key (“more 
blue”) response and a reinforcer was sched- 
uled for that response, a blue check (✓) 
appeared in the bottom left hand corner of 
the screen. Likewise, if the participant made a 
correct right key (“more red”) response and a 
reinforcer was scheduled, a red check ap- 
peared in the bottom right hand corner of the 
screen. For the R+P conditions, additional 
feedback was also presented when participants 
obtained a punisher. If the participant made 
an incorrect “more blue” or “more red” 
response and a punisher was scheduled for 
that response, a blue or red picture of an “X”
appeared at the bottom of the screen, on the 
side corresponding to the response key they 
just pressed. For the R-only conditions, sessions
ended when participants reached 70 points or
after 50 min. For the R+P conditions, sessions
ended when participants reached 50 points (net)
or after 50 min.

Results and Discussion 

Experimental sessions lasted approximately 
30 to 40 min, with an average of 338 trials
completed (SD = 75.0). The last 120 trials
from each experimental condition were 
analyzed for each participant in the same 
manner as Experiment 1. Participant CY 
made zero B21 responses in the last 120 trials
of Condition 1:5P; thus, a correction was
made with B21 = 0.5 for calculations of log d
and log b for that particular participant in 
that condition. These results are presented in 
Table 3. 
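
The calculation of log d and log b, including the 0.5 correction for an empty cell, can be sketched as follows. This is a minimal sketch assuming the standard Davison and Tustin (1978) point estimates, in which each measure is half the log of a ratio of products of the four response counts. The function name, the argument names, the sign convention (positive log b indicating a left-key bias), and the example counts are illustrative assumptions, not taken from the original analysis.

```python
import math

def detection_measures(correct_left, error_right, error_left, correct_right,
                       correction=0.5):
    """Return (log_d, log_b) from the four cells of a detection matrix.

    correct_left  : left-key responses on "more blue" trials (correct)
    error_right   : right-key responses on "more blue" trials (errors)
    error_left    : left-key responses on "more red" trials (errors)
    correct_right : right-key responses on "more red" trials (correct)
    """
    cells = [correct_left, error_right, error_left, correct_right]
    # Replace an empty cell with a small constant so the logs stay finite.
    cl, er, el, cr = [c if c > 0 else correction for c in cells]
    # Discriminability: correct responses relative to errors.
    log_d = 0.5 * math.log10((cl * cr) / (er * el))
    # Response bias: left-key responses relative to right-key responses.
    log_b = 0.5 * math.log10((cl * el) / (er * cr))
    return log_d, log_b

# Hypothetical counts with one empty error cell (not data from the study).
example_log_d, example_log_b = detection_measures(50, 10, 0, 60)
```

Without the correction, a zero count in any cell would make the ratio zero or undefined, so neither measure could be computed for that condition.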

Figure 6 plots estimates of log d (top) and log 
b (bottom) for each participant across the four 
relative reinforcer frequency variations (5:1, 
2:1, 1:2, and 1:5) for the R-only (left) and R+P 
(right) conditions. Estimates of discriminability 
were more variable in Experiment 3 compared 
to those obtained in Experiments 1 and 2; this 
is most likely because task difficulty was not 
titrated in the present experiment. However, mean
discriminability did not differ systematically
across the R-only conditions, F(3,21) = 1.319,
p = .30, or the R+P conditions, F(3,21) =
1.536, p = .24, so the absence of a titration 
procedure appeared not to affect the overall 
findings. Again, this independence between 
discriminability and relative reinforcer frequen- 
cy variations is consistent with Davison and 
Tustin’s (1978) model of signal detection. As in
Experiments 1 and 2, mean discriminability was
slightly higher across R+P conditions (M =
0.45) than R-only conditions (M = 0.35), but a
4 (Reinforcer Ratio) × 2 (Condition Type)
ANOVA found no significant effect of condition
type, F(1,14) = .824, p = .38.

Figure 6 (bottom) shows that estimates of 
response bias were more variable for R+P 
conditions than R-only conditions. For the R- 
only conditions, estimates of response bias 
differed significantly across reinforcer ratios, 
F(3,21) = 10.74, p < .001. As with Experiment
1, individual estimates of sensitivity were 
calculated from each participant’s response 
bias data for R-only conditions using least 
squares linear regression analyses. Positive 
slopes were found across all participants (M
= 0.31), and a one-sample t-test found that
these were significantly greater than zero, t(7)
= 4.578, p < .01. Thus, participants were
systematically biased towards the alternative 
associated with the higher rate of reinforce- 
ment. The mean slope was similar to the slope 
obtained with Group A of Experiment 1 
(Figure 3, M = 0.36) which arranged similar 
conditions, and also with previous human 
detection research; for example, mean slope 
ranged from 0.33 to 0.36 across experiments
in Alsop et al.’s (1995) study.
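
The sensitivity analysis described above can be sketched as a least squares slope of log b on the log reinforcer ratio for each participant, followed by a one-sample t statistic on the resulting slopes. This is a minimal sketch only: it assumes the arranged (rather than obtained) reinforcer ratios as the predictor, and the function names and the example log b values are hypothetical.

```python
import math

def least_squares_slope(x, y):
    """Slope of the ordinary least squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def one_sample_t(values, mu=0.0):
    """t statistic testing whether the mean of values differs from mu."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu) / math.sqrt(variance / n)

# Log reinforcer ratios for the 5:1, 2:1, 1:2, and 1:5 conditions.
log_ratios = [math.log10(r) for r in (5 / 1, 2 / 1, 1 / 2, 1 / 5)]

# Hypothetical log b values for one participant (not data from the study).
example_log_b = [0.25, 0.10, -0.08, -0.20]
slope = least_squares_slope(log_ratios, example_log_b)
```

Collecting one slope per participant and passing the eight slopes to `one_sample_t` gives a t statistic with 7 degrees of freedom, which matches the form of the test reported here.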

For the R+P conditions, estimates of re- 
sponse bias also differed significantly across 
reinforcer ratios, F(3,21) = 7.633, p < .01.
Individual estimates of sensitivity were calcu- 
lated from each participant’s response bias 
data for R+P conditions using least squares 
linear regression analyses. Again, positive 
slopes were found across all participants (M 
= 0.40) with the possible exception of a 
negligible slope obtained by CM. A one-sample 
t-test also found that these slopes were
significantly greater than zero, t(7) = 3.860,
p < .01. Thus, on average, participants showed 
a greater preference towards the alternative 
associated with the higher rate of reinforce- 
ment when errors were punished occasionally 
than when they were not. This increase in 
mean sensitivity from reinforcer-only to rein- 
forcer + punisher conditions is more consis- 
tent with a subtractive model of punishment 
than an additive model. 
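
Why an increase in sensitivity favors the subtractive model can be illustrated numerically. The sketch below assumes the simplest forms of the two accounts: in the subtractive version, punishers obtained for a response are subtracted from that response's reinforcers, whereas in the additive version, punishers for one response are added to the reinforcers for the other response. The function names, the punisher scaling constant `c` (set to 1), and the rates used are illustrative assumptions, not values from the experiments.

```python
import math

def subtractive_log_ratio(r1, r2, p1, p2, c=1.0):
    """Log effective ratio when punishers subtract from own-key reinforcers."""
    return math.log10((r1 - c * p1) / (r2 - c * p2))

def additive_log_ratio(r1, r2, p1, p2, c=1.0):
    """Log effective ratio when punishers add to the other key's reinforcers."""
    return math.log10((r1 + c * p2) / (r2 + c * p1))

# Hypothetical rates: a 5:1 reinforcer ratio with equal punisher rates.
r1, r2 = 5.0, 1.0
p = 0.5

baseline = math.log10(r1 / r2)              # reinforcers only
sub = subtractive_log_ratio(r1, r2, p, p)   # (5 - 0.5) / (1 - 0.5)
add = additive_log_ratio(r1, r2, p, p)      # (5 + 0.5) / (1 + 0.5)
```

Under these assumptions the subtractive ratio is more extreme than the reinforcer-only ratio and the additive ratio is less extreme, so equal punishment should steepen the bias function under the subtractive account and flatten it under the additive account.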

A closer analysis of the individual response 
bias data was performed to see if within-subject 
changes in sensitivity were consistent with the 
mean increase from R-only to R+P conditions. 
Figure 7 shows each individual’s response bias
data across the four relative reinforcer frequency 
variations, and also displays the results from least 

Fig. 6. Discriminability (log d - top) and response bias (log b - bottom) are plotted over changes in relative
reinforcer frequency (log R11/R22) for the reinforcer-only conditions (left) and the reinforcer + punisher conditions
(right) in Experiment 3. Individual participant data and the overall means are given.


squares linear regression analyses performed on 
each participant for both condition types for 
Experiment 3. Five participants (CB, CY, DK, JF, 
and LSS) showed increases in sensitivity from R- 
only conditions (filled circles) to R+P conditions 
(unfilled triangles), with reasonably good regression
fits (M = .79). However, 3 participants
(NC, CM, and MK) showed decreases in 
sensitivity from R-only to R+P conditions, but 
regression fits were quite poor (i.e., close to 
zero) for 2 of the 3 participants (NC, CM); only 1
participant (MK) showed a decrease in sensitivity 
with good regression fits. Again, the increases in 
sensitivity favor a subtractive punishment model, 


consistent with findings from the concurrent- 
schedules literature (e.g., Critchfield, et al., 
2003; Farley, 1980). 

There was, however, some evidence that the 
order in which participants received the R-only 
and R+P conditions affected performance. 
Participants CB, CM, MK, and NC received 
all four R-only conditions followed by the R+P 
conditions (Figure 7, left panels), with 3
participants showing lower sensitivity in R+P
than R-only conditions (albeit 2 with poor
regression fits). Mean sensitivity across these
4 participants was 0.31 for R-only conditions 
and 0.19 for R+P conditions. In contrast, all 4 

Fig. 7. Response bias (log b) is plotted over changes in
relative reinforcer frequency (log R11/R22) separately for
each participant in Experiment 3. Reinforcer-only conditions
are presented as filled circles, while reinforcer +
punisher conditions are presented as unfilled triangles.
Results from least squares linear regression analyses are
also presented for each participant for R-only and R+P
conditions separately.


participants who received R+P conditions 
followed by R-only conditions (CY, DK, JF, 
and LSS — Figure 7, right) showed greater 
sensitivity with the inclusion of punishment 
(along with reasonable regression fits). Mean 
sensitivity for these participants was 0.31 for R- 
only conditions, and 0.61 for R+P conditions. 

It is unclear why an order effect was found in
the present experiment; however, several possible
explanations were explored. First, it is possible


that sensitivity to the reinforcer ratio de- 
creased over the course of the experiment. 
In both cases, participants obtained lower 
estimates of sensitivity for the second condi- 
tion type compared to the first condition type 
(0.31 to 0.19 for one group, 0.61 to 0.31 for 
the other). However, mean estimates of 
sensitivity for the R-only conditions were 
identical for both groups, and also consistent 
with Group A of Experiment 1 (a = 0.36) and 
previous research (e.g., Alsop et al., 1995); this 
consistency argues against a general overall 
decrease in sensitivity. 

Second, the difference in sensitivities was 
perhaps related to differences in discrimina- 
bility. Figure 6 (top) shows that participants 
who received R+P conditions first had higher 
estimates of discriminability than those who 
received the R+P conditions second; however, 
this difference was not significant. Further- 
more, no significant correlations were found 
between estimates of sensitivity and estimates 
of discriminability for the R-only conditions (r 
= -.05, n = 8, p = .92) or the R+P conditions
(r = .47, n = 8, p = .23). 

Finally, previous concurrent schedule re- 
search (e.g., Alsop & Elliffe, 1988; Logue & 
Chavarro, 1987) has found that increases in 
overall reinforcer rate increased sensitivity. It is 
possible that changes in sensitivity in the 
present experiment were related to overall 
reinforcer or punisher rates. To investigate 
this, overall reinforcer and punisher rates were 
calculated for each condition, and 4 (Rein- 
forcer Ratio) X 2 (Order) ANOVAs were 
performed on the R-only and R+P conditions 
separately. No significant difference in overall 
punisher rates was found for the R+P conditions,
F(1,6) = 0.062, p = .81; however, the
differences in overall reinforcer rates approached
significance for the R+P conditions,
F(1,6) = 5.280, p = .06, and were significant for
the R-only conditions, F(1,6) = 12.30, p < .05.
In both cases, participants who were presented 
with R-only conditions second received higher 
rates of reinforcement (R+P: M = 2.74 
reinforcers/min; R-only: M = 2.96 reinforc- 
ers/min) than those who were presented with 
the R-only conditions first (R+P: M = 2.55 
reinforcers/min; R-only: M = 2.63 reinforcers/min).
However, it seems unlikely that such small
differences in reinforcer rate (R+P = 0.19
reinforcers/min; R-only = 0.33 reinforcers/min)
were sufficient to affect sensitivity, particularly
since Alsop and Elliffe (1988) varied reinforcer
rates between 0.22 and 10 reinforcers per
minute to demonstrate an effect.

GENERAL DISCUSSION 

The present experiments demonstrated that 
point-loss punishers for errors influenced 
human performance on detection tasks. Vary- 
ing the relative frequency of point-loss pun- 
ishment systematically biased participants away 
from responding on the alternative associated 
with the higher rate of punishment (Group B 
- Experiment 1). This was consistent with the 
predictions made by both additive and sub- 
tractive punishment versions of Davison and 
Tustin’s (1978) GML-based model of signal 
detection (Figure 2, top). These results were 
also parallel but opposite to the effects of 
reinforcing correct responses with point gains 
(Group A - Experiment 1), which systemati- 
cally biased participants towards the alterna- 
tive associated with the higher rate of rein- 
forcement; this was consistent with previous 
human detection research (Alsop, et al., 1995; 
Johnstone & Alsop, 1996). 

The results from Experiments 2 and 3 showed
that point-loss punishers also had an effect on 
preference for the more reinforced alterna- 
tive. In both experiments, there was some 
evidence for increases in preference for the 
more reinforced alternative when a constant 
and equal rate of punishment was superim- 
posed onto two response alternatives. Howev- 
er, order effects were found in both experi- 
ments. In Experiment 2, a general increase in 
sensitivity across the three sessions cannot be 
ruled out, although some reversal of the 
effects of punishment was found with Order 
1 (R-only, R+P, R-only), and significant in- 
creases were found from R-only to R+P 
conditions. In Experiment 3, although 5 of 
the 8 participants obtained higher sensitivity 
estimates for R+P conditions than R-only 
conditions (consistent with a subtractive mod- 
el of punishment), 4 of the 5 participants 
received R+P conditions first followed by R- 
only conditions. While previous researchers 
(Critchfield, et al., 2003; Farley, 1980) pre- 
sented R-only conditions first followed by R+P 
conditions, 3 of the 4 participants in Experi- 
ment 3 (of the present set of studies) who 
received conditions in this order showed 
decreases in sensitivity; that is, the opposite 


finding to previous studies. Thus, it appears 
that condition order may also play some part 
in the effects of punishment on sensitivity. The 
effects of condition order on detection and 
choice task performance may warrant further 
investigation. 

Together, the data from Experiments 2 and 
3 provide greater support for a subtractive 
punishment version over an additive punish- 
ment version of Davison and Tustin’s (1978) 
GML-based model of signal detection perfor- 
mance (Figure 2). This result is consistent with 
findings using concurrent-schedule choice 
procedures, where there is overwhelmingly 
more support for a subtractive model of 
punishment (e.g., Critchfield, et al., 2003; de 
Villiers, 1980; Farley, 1980) over an additive 
model of punishment (Deluty, 1976). In fact, 
only Deluty (1976; but see also Deluty, 1982; 
Deluty & Church, 1978) has claimed support 
for an additive model of punishment. A closer 
look at Deluty’s (1976) experiment, however,
shows that the conditions he ran were not 
adequate to directly compare the additive and 
subtractive models. A reanalysis of Deluty’s 
data by de Villiers (1980) showed that the 
subtractive model accounted for a similar 
amount of the variance in Deluty’s data as 
the additive model; that is, both models made 
nearly identical predictions for Deluty’s con- 
ditions. Thus, there appears to be very little 
support for an additive model of punishment, 
and the findings from the present experiments 
extend the support for the subtractive model 
of punishment beyond that of the simple 
concurrent-schedule choice procedure to the 
signal-detection choice procedure. 

While Davison and Tustin’s (1978) model 
appeared to capture the effects that punish- 
ment had on the participants’ behavior quite 
well, Alsop and Davison (Alsop, 1991; Alsop & 
Davison, 1991; Davison, 1991) and Davison 
and Nevin (1999) have proposed a competing 
model of detection performance based on the 
discriminability (or confusability) of stimulus- 
response and response-reinforcer contingen- 
cies. The contingency-discriminability model 
addresses the lack of parameter invariance 
sometimes found with Davison and Tustin’s 
model by using two independent parameters - 
one which measures the discriminability be- 
tween the stimulus-response contingency 
(termed d_s or d_sr) and another which measures
the discriminability between the response-reinforcer
contingency (termed d_r or d_rr).
However, the independence of d_s and d_r has
also received mixed support, with some studies
finding an interaction between d_s and d_r and
others finding no relation (see Alsop & Porritt, 
2006 for a summary). Furthermore, it is 
unclear how punishers should be incorporated 
into the contingency-discriminability model. 
For example, will the discrimination of re- 
sponse-reinforcer and response-punisher con- 
tingencies require the same parameter or 
separate parameters? Likewise, is discrimina- 
bility between the stimulus-response contin- 
gencies similar or different following a rein- 
forcer or a punisher? Even with the 
assumption that d_s and d_r are identical for
reinforcers and punishers, the simplest addi- 
tive and subtractive punishment versions of 
the Alsop-Davison-Nevin model make similar 
predictions to the additive and subtractive 
versions of the Davison and Tustin model 
(Figure 2). As it currently stands, the integra- 
tion of reinforcer and punisher effects in 
detection models appears less complex with 
the GML-based Davison and Tustin model 
compared to the Alsop-Davison-Nevin contin- 
gency-discriminability model. 

The present experiments found an inde- 
pendence between estimates of discriminabil- 
ity and changes to the reinforcer or punisher 
contingencies, consistent with the parameter 
invariance assumption from the Davison and 
Tustin (1978) model. However, there was 
some evidence that discriminability was higher 
in conditions where punishment for errors was 
included (R+P conditions) than conditions 
with no punishment (R-only conditions; Ex- 
periments 1 and 2); the additive or subtractive 
versions of the Davison and Tustin model 
(Equations 6 to 11) do not predict this 
finding. It is possible that punishers improve 
discriminability by altering motivation or 
attention. A recent model proposed by Nevin, 
Davison, and Shahan (2005) integrates the 
Alsop-Davison-Nevin (Alsop & Davison, 1991; 
Davison & Nevin, 1999) contingency-discrimi- 
nability model with a theory of attending. 
Nevin et al. proposed that the probability of 
attending to the sample stimuli (S1 and S2)
and comparison stimuli (termed C1 and C2 for
the stimuli signaling the B1 and B2 responses,
respectively) in a detection task is positively
related to the reinforcer rate, in a manner
similar to behavioral momentum theory (Nevin
& Grace, 2000). Although Nevin et al.’s
model deals explicitly with the effects of 
reinforcement on attention, it is unclear how 
the effects of punishers should be integrated 
into such a model. Does the inclusion of
punishment for errors increase the probability 
of attending to the sample and/or comparison 
stimuli? If so, does the increase in discrimina- 
bility (log d) found for R+P conditions in the 
present experiments imply that reinforcement 
and punishment combine additively to in- 
crease the probability of attending beyond 
the effects of reinforcement alone? This might 
be a challenge for any model based on the 
reinforcer effects encompassed by behavioral 
momentum. Given how little is known about 
the effects of punishment on attention, this 
may be a worthwhile direction for future 
research. 

The present series of experiments is the 
first systematic investigation of the effects of 
punishment on human signal-detection performance,
and it has some limitations that suggest
areas for improvement. For example, due to
time limitations and monetary constraints, 
participants only received one session per 
condition while previous studies of choice 
and punishment (e.g., Critchfield, et al., 
2003; Farley, 1980) arranged a number of 
sessions per condition. A larger range of ratios 
may have also been better suited to the 
differing additive and subtractive model pre- 
dictions, as the deviations from linearity are 
predicted by the subtractive model at extreme 
ratios (Figure 2, dashed line). Future direc- 
tions for research may include studying other 
punisher types (e.g., response effort, time- 
out), and comparisons between human and 
nonhuman detection performance. Because 
punishers are real consequences in many 
everyday situations (e.g., quality control and 
medical screening both have positive conse- 
quences for correct choices and negative 
consequences for errors), research on the 
interaction between reinforcement and punishment
is important on both theoretical
and applied grounds. 